ABSTRACT

This report is concerned with developing a methodology to
be followed when attempting to design a tightly coupled, highly
reliable microprocessor-based computer system. The concept
of Triple Modular Redundancy (TMR) with Sparing is used. The
notion of synchronizing by using a single crystal oscillator
is examined. Decoders are also used in place of voters.
The decoders not only isolate the failed module but also allow
error identification to be accomplished. Each module is to
have its own RAM memory. The necessary circuitry to select a
correct memory and the corresponding DMA controller has been designed.

Introduction


The work in this project involves using commercially available
microprocessors in an environment which requires a highly
reliable microcomputer system. The project specifically addresses
synchronization of microcomputer systems and the organization and
development of a redundant system with sparing. The Intel 8085
system design kit was used to form the basis of the discussions
within this report. However, the concepts presented in this report
can be applied to any of the commercially available
microprocessor-based systems.

It is assumed that commercially available microcomputer systems
will meet the specifications as stated by their manufacturer. These
conditions are not as stringent as space-flight requirements.
Consequently, the failure rate will be significantly higher. We do not
attempt to calculate that failure rate. Rather, emphasis has been
placed on timely, reliable response to a failed module.

Much attention was given to checking in real time, with hardware
performing as many checking functions as possible. By
using hardware, the system can run at the designed speed until an
error occurs. If or when an error does occur, hardware recovery
assures the designers of the fastest and most accurate recovery.

What has been attempted here is the development of a very
tightly coupled Triple Modular Redundancy (TMR) with Sparing
system. A system designed using this philosophy will run exceptionally
fast. Errors will be detected in a minimum amount of time and
the recovery time will also be minimized. The result is a system
which is not slowed down by the checking features, and in most
instances microcomputer modules can be swapped out in a real-time
environment with a minimum effect on real-time operations.

Approach

A triple modular redundancy (TMR) with sparing system normally
has three basic computer modules which are compared constantly
to detect errors. The errors are detected by voting on the addresses,
data and status information. If an error does occur, the module that
is in error is replaced by one of the spares.

Our approach involves using three microcomputer modules in the
basic TMR configuration and three spares (Figure 1). Addresses and
data are checked constantly. Every bus transfer is observed and
checked. The data selectors of Figure 1 are responsible for the
selection of microcomputer modules in the TMR configuration. The
decoders detect errors and identify which module is in error. If
the system is not in a critical I/O operation when an error occurs,
the control network will first store the state of the computer
network. The failed computer module is taken out of the network
by changing the code on the data selector. Next, the error
detector is deactivated to allow the RAM memory to be updated.
One of the correct microcomputer modules is selected. Its
content is assumed to be correct and is broadcast to all
RAM memory. We note that the state of the system is in RAM and
is transferred along with other data. The module counter now
points to the spare to be activated. This number is transferred to
the data selector that handles the module that was in error.
Finally, a vector interrupt is forced by the controller, causing
all microcomputer modules to reload the state and begin to
run. The controller will go back to look for another failure.
System Partitioning and Configuration

The configuration of the system will determine where and
how the TMR and sparing can be invoked into an overall reliable
system. The main factors which determine the overall configuration
are the number of I/O devices, the amount of code that is
written for the system, the amount of connected memory, and the
amount of data space required for the system to function properly.

The 8085 is an example of a very large scale integrated (VLSI)
chip. A minimum configuration (Figure 2) for basic control functions
consists of the processor, an I/O and read-only memory (ROM)
chip, and a random access memory (RAM) and I/O chip. The 8085
processes the information which it obtains from either the ROM
or the RAM memory and the I/O chips. The ROM memory contains
code which is permanently needed to run the system. The system we
refer to has a keyboard and display chip which is used to input
control functions and data, and to output register and memory values.

The RAM and I/O chip has two functions. The first is to hold the
user's program and temporary data. The second is to
interface with external devices.

The total memory for a microcomputer system consists of both
ROM and RAM memory. The program and data size will determine
how the TMR microcomputer will be designed. There are several
choices. The first choice is to have a separate memory for each
microprocessor-based system. If we configure the system in this
manner, it will be necessary to change memories when a processor
malfunctions, and the new memory will have to be updated so far as
data is concerned. We therefore want to keep the ratio
of ROM to RAM as high as possible so that recovery time is as short
as possible.

Figure 2. Minimum System Configuration (Intel)

A variation on this scheme is to allow data to be written to
all RAM memories, whether that RAM is currently a part of the basic
TMR configuration or a part of one of the spares. By writing to
all memories we can decrease the recovery time because updating would
be eliminated. The disadvantage of this method is that all RAM has
been running for the same time and thus is just as likely to be
faulty as the system that it is replacing.

Another choice of configuration is to have the RAMs triplicated
such that we vote on the correct output when data is read from these
memories. If an error is encountered, it will be masked by the
values of the two correct memories.

Checking I/O operations presents a special problem because
the I/O chips are connected to the bus on one side and to the
I/O devices on the other. Once data is transmitted to the I/O
chip, we assume that the I/O chip is performing properly. Data
transferred into the processor can be checked by the error
detecting network because this data is on the main general
purpose bus. We can triplicate the output of the I/O chips.
This would ensure that the I/O devices are receiving the correct
data.

At this point, we see that we can be certain that our system is
performing properly simply by triplicating memories and memory
transfers, and by triplicating the output of the I/O chips. We
must do the same for the bus which interfaces all these devices.
However, the amount of added logic far exceeds that of the basic
system configuration. This implies that if a fault should occur,
it is likely to be in the checking portion as opposed to the
basic computer system. In an attempt to obtain an optimum
configuration, it was decided to check transfers on the main data
and address buses only. Each microcomputer module will have its
own memory. The memory will be divided into ROM and RAM. The
RAM will be as small as possible, so that recovery time will be
minimal.

The configuration decided upon is shown in Figure 1. Three
of the microcomputer modules form the basic TMR configuration.
The remaining three are spares. The solid lines interconnecting
the processors to the data selectors represent the address and
data buses. The number of actual lines depends on processor size
and the bus configuration. Typically, there are sixteen address
lines and eight data lines. In the case of the 8085, there are
sixteen total lines because the lower address and data lines are
multiplexed on the same pins. If the system is designed such that
the address and data lines are demultiplexed between the processor
and the remainder of the system, then we have a normal address
and data bus. Otherwise, the checking network must be responsible
for checking when addresses are valid and when data is valid.

As can be seen from Figure 1, the selection of which
microcomputer module is currently in the TMR configuration is done
by the data selectors. The number of control lines determines
how many spares the system can have. A two-to-four line selector
can have three spares. A three-to-eight line selector can have
seven spares. Therefore, the number of spares is two to the n
minus one, where n is the number of control lines for the selector.
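
The relation between control lines and spares can be written
compactly. The symbol N_spares is a label introduced here for
illustration; the subtraction of one reflects the selector input
that remains reserved for the position's original module.

```latex
N_{\text{spares}} = 2^{n} - 1,
\qquad n = 2 \;\Rightarrow\; 3 \text{ spares},
\qquad n = 3 \;\Rightarrow\; 7 \text{ spares}
```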

The output of the selectors is fed into the voters and error
detector network. This section is responsible for the
identification of the failed microcomputer module and the generation
of the data signals. The error recovery network is responsible for the
sequencing of control information if an error does occur. This
network will interrupt the microcomputer modules and cause them
to store current state information.

The Next Available Spare (NAS) counter will be incremented and
this information will be directed to the data selector of the module
which is in error. This causes another processor to be selected. The
TMR Controller will then cause the memories of the active processor
modules to re-write themselves. During this period, the error detector
is ignored but the voter is relied upon to bring the new memory
up to date.

Finally, the TMR Controller will cause the microcomputer modules
in the active processor configuration to restore the state. At this
point, the processing can continue and the Error Detector can look for
another fault.

Synchronization


One of the first problems to be solved is that of synchronization.
It may or may not be a problem, depending on how the checking is to
be done on the microcomputer modules and on what is to be checked.
Let's look at some possible checking schemes. One
approach is to check data after a given number of instructions have
been executed. There are several advantages to this scheme: 1)
timing is not as critical because we can simply count op-code fetch
cycles until we are ready to test data, 2) microcomputer modules can
be slightly out of sync during an interval, and 3) the checking will
be minimized because we only use a few cycles to check. The
disadvantages are: 1) an error may occur during the timing period that
will go undetected until a checking cycle is entered, 2) incorrect
output data or status can alter handshaking and data flow between
I/O devices and the microcomputer module, and 3) to make this scheme
efficient, there must be restrictions on the programmer and the
software that is written for the system.


Another scheme is to check only when data is being transferred.
The philosophy here is "until incorrect data or status is
transmitted beyond the system, no fault has occurred." The disadvantage
of this philosophy is that the memory may be completely degraded by the
time the fault is discovered. One solution to this problem is to
triplicate the memory so that incorrect data cannot degrade the
system. Of course, this means that more logic has to be added to
the system.


It was decided that minimizing the logic is the best approach.
A parts count was the determining factor that led to this decision.
If the number of components is increased beyond that of the system,
then the checking network is more likely to be in error than is the
original microcomputer module. Consequently, it was decided that the
best approach was to check address and data transfers only. More
precisely, all addresses and data transfers to and from the processors,
memories and I/O devices are checked.

Checking all data transfers implies a tight coupling between
all microcomputer modules as well as the checking network. It
also means that all microcomputer modules must be executing the
same code in perfect sync with one another. Hence, it was necessary
to devise a scheme where only one clock is used for the entire
TMR system, including the sparing modules as well as the voting
system.

By observing the timing mechanism of Figure 2, we can see another
problem. The clocking circuit for microprocessors has been placed
inside the chip with the logic so that the chip count can be reduced.
Synchronized timing can be obtained by the intelligent use of external
timing control lines such as READY, RESET-IN, or HOLD.

An attempt was made to use the READY input to synchronize the
microprocessors. The circuit of Figure 3 was used to hold
the READY line of all processors until each was in the op-code fetch
state. When a processor enters the op-code fetch state, the status
lines IO/M, S1 and S2 are in state (011). This causes the first
one-shot to fire, changing the state of the op-code fetch flip-flop.
When all other processors have reached the same state, the second
one-shot will fire. If the single-step switch is inactive, the RDY
signal will be sent to all processors. The op-code fetch flip-flop is
reset after a short delay and waits for the next instruction to go to
the op-code fetch state.

Figure 3. A Synchronization Technique

On an oscilloscope, all processors appeared to be in
perfect sync. However, we were not satisfied with this result because
the start-up conditions were unknown. It was decided to see
what would happen if the crystal oscillator of the first
microprocessor was placed across the clock circuit of all
microprocessors at the same time. With a counter attached to the
CLK-OUT (clock-out) signal, we measured the variation in frequency.
Table 1 shows the result of attempting to synchronize by using
a single crystal oscillator. As can be seen, the frequency will
decrease somewhat as more processors are connected.


# of Processors        Measured Frequency

                       3.07437 MHz
                       3.07300 MHz
                       3.07259 MHz
                       3.07225 MHz

Table 1. Frequency Variation of Single Crystal Technique.





Error Definition

Once we are sure that a group of processors is executing
the same code at the same time, the next problem is to define what
is considered to be an error. Since the bus is being monitored
to determine if a fault has occurred, addresses, data, and
status must be continually observed. Addresses may be generated
by the processors. Data may be generated by the processors,
memory or input devices. Status may be generated by the processor
or input devices. In the case of the Intel 8085, the status lines
are WRITE, READ, IO/M, INTA, RESET-OUT and the processor state
lines S1 and S2.


In our study, research was limited to commercial microprocessors.
The Intel 8085 System Design Kits (SDK-85) obtained to form the basis
of this study have extra board space, which allowed a TMR
network to be placed in close proximity to the system. The size of the
board also limited the error detecting to the addresses and the data
only. This was due to the limited space for adding the buses necessary
for monitoring.

Errors can exist on address, data and status lines. In
developing a TMR system, we must know when to check for errors.
By observing the instruction cycle (Figure 4), we find that it is
divided into a number of machine cycles, namely the opcode fetch,
memory read, memory write, I/O read, I/O write, interrupt
acknowledge, and bus idle machine cycles. All instructions consist
of at least an opcode fetch cycle.

Figure 4. Typical Instruction Cycle.


Let's look at a typical instruction: store register.
First, the processor will enter the opcode fetch cycle, in which
the contents of the program counter are placed on the address
bus. The READ command will be issued and the memory will respond
by placing the opcode on the data bus. Next we enter two memory
read cycles, one for the least significant address byte and one
for the most significant address byte for the storage of the
register content. In each of these memory read cycles (Figure 6a),
the contents of the program counter will be placed on the address
bus, the READ command will be issued and the memory will place a byte
on the data bus. Finally, the processor will enter a memory
write cycle (Figure 6b). The address obtained in the two memory read
cycles will be placed on the address bus, the WRITE command will be
issued and the contents of the register will be placed on the
data bus.

The ultimate goal in this project is to detect errors
instantaneously and correct the system by bringing in a new spare as
soon as possible. This means every address and every data transfer
must be checked. By observing Figures 5, 6a, 6b, and 7 we see
that addresses are valid whenever the ALE line is active and data is
valid whenever WRITE, READ, or INTA is active. These lines
can be used to derive the times for checking addresses and data.

There are limitations on the size of the error detecting
network based on the speed of the microprocessor. The 8085
uses a 6.144 megahertz (MHz) crystal. This frequency is
divided by a factor of two to obtain a clock-out signal of
3.072 MHz. The period of this clock cycle is 325.5 nanoseconds
(ns). From Figure 5 we see that addresses are valid on one
clock cycle and data is valid on the very next clock cycle. This
means that the checking must be accomplished in one clock cycle
time period (325.5 ns) or less, or the system has to be slowed by
using a READY signal.
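
As a quick check of the timing budget, the clock period follows
directly from the crystal frequency. The symbols f_xtal, f_clk and
T_clk are labels introduced here for illustration only.

```latex
f_{\text{clk}} = \frac{f_{\text{xtal}}}{2}
             = \frac{6.144\ \text{MHz}}{2}
             = 3.072\ \text{MHz},
\qquad
T_{\text{clk}} = \frac{1}{f_{\text{clk}}}
             = \frac{1}{3.072 \times 10^{6}\ \text{Hz}}
             \approx 325.5\ \text{ns}
```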

Should an error occur, the system must store the state of the
processors, update a new memory, and restore the state before a
spare can replace a failed microprocessor module. This will require
many machine cycles and hence the required time to perform this
operation is greater than 325.5 ns. Therefore, the computer
system must be temporarily stopped when a new spare is to be made
active.

The question arises, "What if an error occurs during a critical
timed I/O operation?" This condition can be handled by the
hardware of the Error Recovery System and the software generated by
the programmer for this system. Prior to entering a critical
timed I/O operation, the programmer can disable the
Error Recovery until the system exits the critical timing period.
The addresses and data used during this period can be the "voted"
addresses and data.

TMR Controller

The key to the entire design is the TMR controller. Its purpose
is to 1) bring the microcomputer modules up in a TMR configuration
by selecting three modules as the heart of the system, 2) allow the
system to continue to run until an error is detected, 3) disallow
any interruptions during a critical I/O operation, 4) detect and
identify failed modules, 5) cause all modules to save their state of
operation when an error is discovered, 6) select a correct module's
memory to copy, 7) write the RAM section of the new processor, 8)
restore the state of operation after the memory update, and 9)
start looking for another error.

The state diagram for the TMR controller is shown in Figure 8.
After the system is reset, the TMR controller will set the processor
selectors to 00. Thus, the controller observes the buses of the three
main microcomputer modules. The controller then goes to a RUN state.
The controller will remain in this state until either there is a
critical I/O operation or an error is discovered.

If a critical I/O operation is to take place, the controller
will set the Critical I/O flip-flop (CIO). When this operation
is complete, the CIO flip-flop will be reset.

If an error is detected, the controller will exit the RUN
state and the CIO flip-flop will be tested. If the CIO flip-flop
is set, the error detect and error recovery circuits will be
disabled. The controller will continue to RUN with an error until
the CIO flip-flop is reset. When this happens, the controller
will issue a vector interrupt to all active modules. The modules
store all pertinent information regarding the present condition
of the task. The failed module may have completely deteriorated
by this time, but the remaining good modules will have accurate
information in their state vector.

Figure 8. TMR Controller State Diagram.

The TMR controller must now select a correct module and update
the memory of the new module to be brought into the main TMR
configuration; alternatively, a correct memory can be read and the
information broadcast to all memories of all processors. Along with
the information transfer will be the state vector of the correct
processor.

The next available spare (NAS) counter will be incremented
to select the next module to be brought into the main configuration.
The command is issued to restore the state of the machine.
When this happens, all processors have the same state
vector. Finally, the TMR controller will go back to the RUN
state and look for new errors.
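
The controller sequence described above can be sketched as a small
state machine. This is only an illustrative model: the state names
other than RUN, the function names, and the one-step-per-call timing
are assumptions made here and are not taken from Figure 8.

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    RUN = auto()
    SAVE_STATE = auto()      # hypothetical name: vector interrupt issued
    UPDATE_MEMORY = auto()   # hypothetical name: RAM copy in progress
    SELECT_SPARE = auto()    # hypothetical name: NAS counter incremented
    RESTORE_STATE = auto()   # hypothetical name: modules reload state


def next_state(state, error=False, cio=False):
    """Return the controller's next state.

    error: the error detector has latched a fault.
    cio:   the Critical I/O flip-flop is set, so recovery must wait.
    """
    if state is State.RESET:
        return State.RUN              # selectors set to 00, start running
    if state is State.RUN:
        if error and not cio:
            return State.SAVE_STATE   # modules store their state vector
        return State.RUN              # keep running, possibly with an error
    if state is State.SAVE_STATE:
        return State.UPDATE_MEMORY    # copy a correct RAM to all memories
    if state is State.UPDATE_MEMORY:
        return State.SELECT_SPARE     # bring in the next available spare
    if state is State.SELECT_SPARE:
        return State.RESTORE_STATE    # modules reload the state vector
    return State.RUN                  # RESTORE_STATE: look for new errors


def trace(events):
    """Run from RESET over (error, cio) pairs; return visited state names."""
    state = State.RESET
    visited = [state.name]
    for error, cio in events:
        state = next_state(state, error, cio)
        visited.append(state.name)
    return visited
```

Calling trace with an error that arrives while CIO is set shows the
controller staying in RUN until CIO clears, then stepping through
the recovery states and returning to RUN.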

The TMR controller block diagram is shown in Figure 9.
The purpose of the Error Decoder is two-fold. First, this
component will be responsible for detecting errors and second,
it will identify which microcomputer module is in error. The
diagram of Figure 9 shows the Error Decoder as having only a
few output lines. In actuality, the lines shown must be multiplied
by the number of address, data and status lines to be checked.
Since only one error line and three ID lines are required, the
concentrator circuits of Figure 9 are required. The ALE, WRITE,
and READ lines are required to establish timing signals. These
signals tell the TMR controller when addresses and data are valid.
Since we do not know which ALE, WRITE, or READ signal will be
valid, it is necessary to concentrate all of them and vote
on when data is valid.

The function of the new processor selector (NPS) is to keep track
of which processor is next to be brought into the TMR network.
This is accomplished by using a clock and a counter. The counter
is enabled only when there is an error. The output of the counter
goes to a selector latch only if the corresponding microcomputer
module is the one at fault. The network is shown in Figure 10.
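
A minimal sketch of the counter-and-latch behaviour described above.
It assumes that selector code 0 picks a position's original module
and that codes 1 through 2**n - 1 pick spares shared by all three
positions; the class and method names are hypothetical and are not
taken from Figure 10.

```python
class SpareSelector:
    """Model of the spare counter and the three selector latches."""

    def __init__(self, control_lines=2):
        self.max_code = (1 << control_lines) - 1  # 2**n - 1 spares
        self.counter = 0                          # next-spare counter
        self.latches = [0, 0, 0]                  # code 0 = original module

    def replace(self, position):
        """Advance the counter and latch it for the faulty position (0..2).

        Only the latch of the position at fault is loaded; the other
        two latches keep their current codes.
        """
        if self.counter >= self.max_code:
            raise RuntimeError("no spares left")
        self.counter += 1
        self.latches[position] = self.counter
        return self.counter
```

With two control lines the model allows three replacements before it
reports that the spares are exhausted.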

The DMA controller has several functions. It is responsible
for 1) creating the correct RAM addresses, 2) sending the READ
command out to a correct microcomputer's memory, 3) receiving
the data into a buffer, 4) broadcasting the WRITE command to all
memories, 5) checking for last address, and 6) incrementing the
address counter. The basic part of the DMA controller is shown
in Figure 14.
Error Detection

In this section we want to identify the point at which
we detect errors. In the TMR configuration, the error detector
is constantly observing data transfers of the three microcomputer
modules. Let's look at the first data line D(0) of the three
modules that make up the TMR portion of the network. Table 2
is a list of the eight possible states of this data line. If
the error detector receives 000 or 111, then it is assumed that
all modules are performing as required and there are no faults.

If the result is either 011 or 100, then it is assumed that
the first module is at fault. A "011" would indicate that modules
2 and 3 are reporting a '1' and module 1 is reporting a '0'.
On the other hand, if "100" is recorded, then module 1 is
reporting a '1' while modules 2 and 3 are reporting a '0'.
Using the same argument, we can see that "101" and "010" would
indicate a fault on module 2 and a "110" or a "001" would
indicate a fault on module 3.

One method of detecting these errors is with an exclusive-OR
tree. However, the codes above could also be entered into a
decoder. The advantage of using a decoder is that it both detects
the fault and identifies the failed module. If line 0 or
7 of the decoder is activated, then there is no fault. If line 3
or 4 is activated, microcomputer module #1 is at fault. Module
#2 is at fault when line 2 or 5 becomes active, and module #3 is at
fault when line 1 or 6 is active. This portion of the error detection
network is shown in Figure 11 as the check decoders.
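
The line assignments above amount to a small lookup table. The
sketch below models one bus line only and treats module 1 as the
most significant decoder input, as the codes in the text imply; the
names FAULT_BY_LINE and faulty_module are hypothetical.

```python
# Decoder line number -> faulty module (None means no fault),
# following the line assignments given in the text.
FAULT_BY_LINE = {0: None, 7: None, 3: 1, 4: 1, 2: 2, 5: 2, 1: 3, 6: 3}


def faulty_module(m1, m2, m3):
    """Return the module (1-3) that disagrees on one bus line, or None.

    m1, m2, m3 are the bit values (0 or 1) reported by the three
    modules in the TMR configuration.
    """
    line = (m1 << 2) | (m2 << 1) | m3   # decoder output that is activated
    return FAULT_BY_LINE[line]
```

For example, faulty_module(0, 1, 1) corresponds to the code "011"
and identifies module 1.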

Figure 11. Error Detect & Identification.

We can use the same decoders to determine the correct output.
By merely connecting the output of the decoder to a different
network (Figure 13), the correct value can be obtained. This
may be necessary in situations where we cannot stop the processors
to make a switch. By observing Figure 13 we find that what is required
is to take lines 3, 5, 6, and 7 to a four-input NAND gate. If the
output is to be voted a '1', then one of these four lines will be
low, causing the output to be a logical '1'.
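
The voting arrangement can be checked with a short model. It assumes
a 3-to-8 decoder with active-low outputs (consistent with "one of
these four lines will be low") and module 1 as the most significant
input; the function names are hypothetical.

```python
def decode_active_low(m1, m2, m3):
    """3-to-8 decoder with active-low outputs: only the selected line is 0."""
    selected = (m1 << 2) | (m2 << 1) | m3
    return [0 if line == selected else 1 for line in range(8)]


def voted_bit(m1, m2, m3):
    """NAND of decoder lines 3, 5, 6 and 7 yields the majority value."""
    y = decode_active_low(m1, m2, m3)
    all_high = y[3] & y[5] & y[6] & y[7]
    return 1 - all_high   # NAND: 1 whenever any of the four lines is low
```

Lines 3, 5, 6 and 7 are exactly the codes in which at least two of
the three modules report a '1', which is why a single NAND gate is
enough to recover the voted value.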

There is also a need for this error detector to be
self-checking. In observing Figure 11 we see that, whether there is a
fault or not, the NAND gates must maintain a condition in which
there is only one logical '1'. The most probable fault is that
the decoder does not decode, that is, all outputs remain high.
Of course, another possible fault is that two or more lines are
decoded at the same time. In either case, circuitry can be added
to detect these failure conditions. The simplest condition to
detect is all outputs remaining high. This can be done with
a single 4-input gate.

Error Reporting


The scheme for detecting faults on one line is shown in
Figure 11. The first problem is to determine how many lines of the
bus we want to check. It was decided that the checking would be
done over the address and data lines only. This decision was
reached by considering the cabling width and cable count of the
hardware that we used. In an actual system, the cabling size
would not be a determining factor. Reliability would have highest
priority.

The output of NAND gates 1 through 3 of Figure 11 is fed to
the input of Figure 12 such that the signal line EDGS1 represents
the error status of TMR module-1, EDGS2 represents the error status
of TMR module-2, and EDGS3 represents the error status of TMR
module-3. These three signals are latched to preserve the error
information.

The output of NAND gate 0 of Figure 11 is fed into the upper
network of Figure 12 to determine if a fault has occurred or not.
The output of this network is fed into a one-shot to produce a
latch pulse for the error detector network.

The next problem is to determine when we should look for an
error. Since the address and data lines are to be checked,
we must check whenever addresses or data are to be
transferred over the bus. Observing Figures 4, 5, 6a, 6b and 7,
we see that addresses are valid whenever ALE is high and data
is valid when either WRITE, READ, or INTERRUPT ACKNOWLEDGE is
valid. We use this information (in TMR form) to fire the
one-shot which will produce a latch pulse if there is a fault.

Figure 14. DMA Controller for Memory Update.

One problem that should be solved is that of knowing
which control lines (ALE, READ, or WRITE) to use to generate
the timing pulses for the error checking. Under the single-fault
assumption, the failure is assumed to be associated with the
address or data bus. We can therefore use the circuit of
Figure 15 to concentrate the control line signals.

Memory Exchange

The exchange of data between a good processor's memory and
a new processor to become a part of the TMR configuration is
handled in four steps. The first step is to store the present
state of the processors. This is accomplished by using a vector
interrupt and the algorithm of Figure 16a. The second step is
to select a correct memory to read from. The third step is to use
a DMA controller to read data into a buffer, one datum at a time,
and to write the contents of that buffer to all memories. The
fourth step is to restore the state of the processors by using the
algorithm of Figure 16b.

A machine to select a correct memory to read from can be designed
by using the state diagram approach. The state diagram is shown in
Figure 17. The input to the state diagram is taken from the Error
ID Latch (EIDL) of Figure 12. If EIDL is 111, then there is no
error and the machine will remain in state S0. If EIDL is 011,
then the first microcomputer module has failed. Microcomputer module
2 or 3 can be used as the correct memory. The machine will go to
state S1. The output associated with state S1 must indicate that
memory 2 or 3 is available to be copied. We chose 2 to be specific.
Let's suppose that EIDL has 101 while the machine is in state S0.
Microcomputer module 2 has failed. The machine will go to state
S2. Memories 1 and 3 are available for copying. Since we must choose
one of these, we select the first one to copy. The selection of
memories continues until there are two or fewer processors in the
system. Presently, the machine is designed to go to a HALT state.
However, more work can be done on this project to have the machine
degrade more slowly.

Figure 16a. Software to Store the State of Processors.

Figure 16b. Software to Restore State of Processors.

One final note on the machine. The bottom
part of each state indicates the status of the processors. Where
there is a 0 in a position, this represents an inactive processor.
Positions with a 1 indicate a good processor, and a position
with an underlined 1 indicates an active processor.
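
The choice of which memory to copy can be illustrated for a single
fault. This sketch covers only the selection rule described in the
text (a '0' in EIDL marks the failed position, and the
lowest-numbered correct position is copied); it does not model the
state tracking or the HALT state of Figure 17, and the function name
is hypothetical.

```python
def select_memory(eidl):
    """Pick the TMR position (1-3) whose memory is copied, or None.

    eidl is the 3-bit Error ID Latch written as a string; a '0' marks
    the failed position, so "011" means position 1 has failed.
    """
    if eidl == "111":
        return None                      # no error: nothing to copy
    if len(eidl) != 3 or eidl.count("0") != 1:
        raise ValueError("single-fault assumption violated")
    failed = eidl.index("0") + 1
    # Choose the lowest-numbered position that is still correct.
    return min(p for p in (1, 2, 3) if p != failed)
```

With this rule, "011" selects memory 2 and "101" selects memory 1,
matching the two cases worked through in the text.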

The machine is implemented in ROM. Figure 18 indicates the
essential locations and their content. The Address Field is
partitioned into two parts, the input from Figure 12 and the present
state. The Content Field is also partitioned into two parts, the
encoded output and the next state. Figure 19 is a physical
realization of the Memory Select ROM network.

The DMA controller consists of a small counter and decoder 
to generate DMA microcommands and four hexadecimal counters to 
generate the addresses for the memory exchange. The hexadecimal 
counters are cascaded such that the carry out of one stage 
becomes the clock input of the next stage. The counters can be 
parallel loaded to start at any address. The DMA flip-flop 
is controlled by external logic and is reset by the Last Address 
Gate when the cycle is complete. The ALE, READ, and 
WRITE signals are produced by the decoder. First, the ALE signal 
is generated to indicate that the addresses are valid. Then the 
READ command is issued to only one processor, as specified by the 
ROM network. At this point data is transferred into a buffer. 
Next, the ALE signal is again generated and a WRITE command is 
sent to all processors. Finally, the Cycle Complete signal is 
generated. The controller is finished when both a Cycle Complete 
signal and a Last Address signal are received. 
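
The transfer sequence described above can be sketched as a small 
simulation. This is an illustrative model under stated assumptions, 
not the controller hardware: memories are modeled as byte arrays, the 
four hexadecimal counters as a list of 4-bit digits with ripple 
carry, and the names increment, address, and dma_copy are 
hypothetical. The log records the order of the microcommands for one 
address at a time. 

```python
def increment(digits):
    """Ripple-increment the 4-bit stages, least significant stage first."""
    for i in range(len(digits)):
        digits[i] = (digits[i] + 1) & 0xF
        if digits[i] != 0:  # no carry out of this stage, so stop here
            break
    return digits


def address(digits):
    """Combine the four hexadecimal stages into one 16-bit address."""
    return sum(d << (4 * i) for i, d in enumerate(digits))


def dma_copy(memories, source, start, last):
    """Copy memories[source][start..last] into every memory (assumed model)."""
    if not 0 <= start <= last <= 0xFFFF:
        raise ValueError("expected 0 <= start <= last <= 0xFFFF")
    digits = [(start >> (4 * i)) & 0xF for i in range(4)]  # parallel load
    log = []
    while True:
        addr = address(digits)
        log.append(("ALE", addr))          # addresses are valid
        buffer = memories[source][addr]    # READ from the selected memory only
        log.append(("READ", source))
        log.append(("ALE", addr))          # addresses are valid again
        for mem in memories:               # WRITE to all processors
            mem[addr] = buffer
        log.append(("WRITE", "all"))
        log.append(("CYCLE_COMPLETE", addr))
        if addr == last:                   # Last Address: transfer is finished
            break
        increment(digits)
    return log


# Usage: memory 1 is taken as the good copy for addresses 0x0E to 0x12.
memories = [bytearray(32), bytearray(range(32)), bytearray(b"\xff" * 32)]
trace = dma_copy(memories, source=1, start=0x0E, last=0x12)
```

The range in the usage example crosses 0x0F to 0x10, so the carry 
from the first counter stage into the second is exercised. 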

    ADDRESS        CONTENT           ADDRESS        CONTENT
  IN    P.S.     OUT    N.S.       IN    P.S.     OUT    N.S.

  111   00000    000    00000      110   00111    011    10000
  011   00000    010    00001      011   01000    100    01101
  101   00000    001    00010      101   01000    001    10000
  110   00000    001    00011      110   01000    001    10000
  011   00001    010    00100      011   01001    010    01100
  101   00001    011    00101      101   01001    001    10000
  110   00001    010    00110      110   01001    001    01111
  001   00010    011    00101      011   01010    111    10000
  110   00010    001    01000      011   01100    111    10000
  011   00011    010    00110      011   01101    111    10000
  101   00011    001    01000      011   01110    111    10000
  110   00011    001    01001      011   01111    111    10000
  011   00100    010    01010      101   01010    111    10000
  101   00100    011    01011      101   01011    111    10000
  110   00100    010    01100      101   01100    111    10000
  011   00101    011    01011      101   01101    111    10000
  101   00101    011    01011      101   01110    111    10000
  110   00101    100    01101      101   01111    111    10000
  011   00110    010    01100      110   01010    111    10000
  101   00110    100    01101      110   01011    111    10000
  110   00110    010    01100      110   01100    111    10000
  011   00111    011    01011      110   01101    111    10000
  101   00111    001    01110      110   01110    111    10000
                                   110            111    10000

This project was started during a period when 8-bit micro- 
processors were the state of the art. Now 16-bit and 32-bit micro- 
processors are almost commonplace. Microcomputers on a single chip 
are also common, but they have little memory available on the chip. 
The methodology presented in this report is compatible with any 
microprocessor based computer system. However, if there were more 
test points available, the job of increasing reliability would be 
much easier. 

Typical microprocessors have 40 pins available to the user. 
This means that little more than addresses and data are available. 
When the clock circuit is placed inside the processor, almost no 
checking can be done on the timing circuit, which is required to 
synchronize a TMR system. A tightly coupled real-time system is 
required for efficient aerospace computers. Therefore, the system 
designer must make use of every test point available. 

There are a number of different approaches. We can triplicate 
memories, add software checks, or insert software breakpoints. But 
the most efficient system is one in which testing is transparent to 
the processing of the system. This report presents such an approach. 

Synchronization has been accomplished by using a single crystal. 
As the number of processors is increased, the frequency decreases. 
The decrease is generally less than 100 hertz per processor. 
This is a minimal decrease in performance, and a significant 
reduction in part count for this function. 

Instead of using voters, our approach uses logic decoders for 
detecting an error and identifying the failed module. Again the 
part count is reduced by getting two functions performed for the 
price of one. 
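
The difference from a voter can be illustrated with a short sketch. 
It is not the decoder circuit of Figure 12. It assumes only the error 
code convention used for the Error ID Latch in this report (111 for 
no error, 011, 101, and 110 for a failure of module 1, 2, or 3) and 
marks a module as good when its output agrees with at least one other 
module. The function name error_id is hypothetical. 

```python
def error_id(out1, out2, out3):
    """Return a 3-character code; position i is '1' if module i agrees
    with at least one other module (assumed agreement rule)."""
    ok1 = out1 == out2 or out1 == out3
    ok2 = out2 == out1 or out2 == out3
    ok3 = out3 == out1 or out3 == out2
    return "".join("1" if ok else "0" for ok in (ok1, ok2, ok3))


# A single comparison step both detects the error and names the module.
code = error_id(0x3C, 0x7F, 0x3C)  # module 2 disagrees with the others
```

A majority voter would only pass the agreed value along, whereas the 
code produced here also identifies which module to isolate. 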

The design also uses a ROM as a circuit element to select 
a good memory to copy when an error occurs. The alternative was to 
use a triplicated memory. This technique has several shortcomings. 
First, if one of the memory modules is lost due to the first 
failure, then the second failure takes the system down. Second, 
a system that is to be designed around one of the microcomputers- 
on-a-chip cannot use this technique. 

Overall, the techniques examined in this report will aid 
in the design of a high-speed aerospace computer regardless of 
whether that system is designed around 8-, 16-, or 32-bit micro- 
processors or a microcomputer-on-a-chip. 




